Sorted Sliding Window Compression

Author

  • Ulrich Gräf
Abstract

Sorted Sliding Window Compression (SSWC) uses a new model, the Sorted Sliding Window Model (SSWM), to efficiently encode strings that reappear while a symbol sequence is being encoded. The SSWM holds statistics for all strings up to a certain length k in a sliding window of size n (the sliding window is defined as in LZ77). The compression program can use the SSWM to determine whether the string formed by the next symbols is already contained in the sliding window; the model returns the length of the match. The SSWM directly provides statistics (the borders of a subinterval within an interval) for use in entropy coding methods such as Arithmetic Coding or Dense Coding [Gra97]. Given a number in an interval and a string length, the SSWM returns the corresponding string, which is used in decompression. After each encoding (decoding) step, the model is updated with the characters just encoded (decoded). The model sorts all string starting points in the sliding window lexicographically. A simple way to implement the SSWM is by exhaustive search in the sliding window; the implementation used here is based on a B-tree together with special binary searches. SSWC is a simple compression scheme that uses this new model in order to evaluate its properties. It looks at the next characters to encode and determines the longest match in the SSWM. If the match is shorter than 2, the character is encoded; otherwise the length and the subinterval of the string are encoded. The length values are encoded together with the single characters using the same adaptive frequency model. In addition, some rules are used to reduce the match length if the code length would otherwise get worse. Frequencies and intervals are encoded with Dense Coding. On average, SSWC is better than gzip [Gai93] on the Calgary corpus: 0.2 to 0.5 bits per byte better on most files and at most 0.03 bits per byte worse (progc and progl). This demonstrates the quality of the SSWM and gives confidence in its usability as a new building block in models for compression.
The SSWM has O(log k) computational complexity for all operations and needs O(n) space. It can be used to implement PPM or Markov models in limited-space environments because it holds all the necessary information.
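
The model interface described in the abstract can be illustrated with a minimal Python sketch of the exhaustive-search variant. It is not the author's B-tree implementation, and it does not reach the stated O(log k) bounds because it re-sorts the window on every query. All names (`NaiveSSWM`, `sswc_tokens`, `sswc_untokenize`) and the default window parameters are hypothetical. The sketch also assumes that only starting points with a full-length string inside the window are counted, emits plain tokens instead of Dense Coding output, and omits the rules that shorten a match.

```python
from bisect import bisect_left, bisect_right


class NaiveSSWM:
    """Exhaustive-search stand-in for the Sorted Sliding Window Model."""

    def __init__(self, window_size, max_len):
        self.n = window_size   # sliding window size n
        self.k = max_len       # maximum string length k
        self.window = ""

    def update(self, text):
        """Append just-coded characters; drop those that left the window."""
        self.window = (self.window + text)[-self.n:]

    def _keys(self, length):
        # All strings of this length in the window, sorted lexicographically
        # (assumption: starting points too close to the window end are skipped).
        w = self.window
        return sorted(w[i:i + length] for i in range(len(w) - length + 1))

    def longest_match(self, lookahead):
        """Length (at most k) of the longest prefix of lookahead in the window."""
        for length in range(min(self.k, len(lookahead)), 0, -1):
            if lookahead[:length] in self.window:
                return length
        return 0

    def interval(self, s):
        """Return (low, high, total): s occupies [low, high) of the sorted order."""
        keys = self._keys(len(s))
        return bisect_left(keys, s), bisect_right(keys, s), len(keys)

    def string_at(self, number, length):
        """Inverse lookup for decoding: the string whose subinterval holds number."""
        return self._keys(length)[number]


def sswc_tokens(text, window_size=32, max_len=8):
    """Greedy parse: a literal if the match is shorter than 2, else a match token."""
    model, tokens, pos = NaiveSSWM(window_size, max_len), [], 0
    while pos < len(text):
        length = model.longest_match(text[pos:pos + max_len])
        if length < 2:
            tokens.append(("lit", text[pos]))
            step = 1
        else:
            low, high, total = model.interval(text[pos:pos + length])
            tokens.append(("match", length, low, high, total))
            step = length
        model.update(text[pos:pos + step])   # keep the model in sync with the decoder
        pos += step
    return tokens


def sswc_untokenize(tokens, window_size=32, max_len=8):
    """Rebuild the text; any number inside [low, high) would select the same string."""
    model, out = NaiveSSWM(window_size, max_len), []
    for tok in tokens:
        piece = tok[1] if tok[0] == "lit" else model.string_at(tok[2], tok[1])
        out.append(piece)
        model.update(piece)
    return "".join(out)
```

In a full coder, the `(low, high, total)` triple would be handed to the entropy coder as the subinterval borders, and the match lengths would share one adaptive frequency model with the literals; here the tokens only make the encoder and decoder steps visible.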


Similar resources

LZW Data Compression on Large Scale and Extreme Distributed Systems

Results on the parallel complexity of Lempel-Ziv data compression suggest that the sliding window method is more suitable than the LZW technique on shared memory parallel machines. When instead we address the practical goal of designing distributed algorithms with low communication cost, sliding window compression does not seem to guarantee robustness if we scale up the system. The possibility ...


The Imaginary Sliding Window As a New Data Structure for Adaptive Algorithms

Abstract. The scheme of the sliding window is known in information theory, computer science, prediction problems, and statistics. Let a source with unknown statistics generate some word $\ldots x_{-1} x_0 x_1 x_2 \ldots$ in some alphabet $A$. For every moment $t$, $t = \ldots, -1, 0, 1, \ldots$, one stores the word ("window") $x_{t-w} x_{t-w+1} \ldots x_{t-1}$, where $w$, $w \geq 1$, is called the "window length". In the theory of ...


Dictionary Compression on the PRAM

Parallel algorithms for lossless data compression via dictionary compression using optimal, longest fragment first (LFF), and greedy parsing strategies are described. Dictionary compression removes redundancy by replacing substrings of the input by references to strings stored in a dictionary. Given a static dictionary stored as a suffix tree, we present a CREW PRAM algorithm for optimal compressio...


FDiBC: A Novel Fraud Detection Method in Bank Club based on Sliding Time and Scores Window

One of the recent strategies for increasing customer loyalty in the banking industry is the use of a customers' club system. In this system, customers receive scores on the basis of the financial and club activities they perform, and based on the points achieved, they get credits from the bank. In addition, with the advent of new technologies, fraud is growing in the banking domain as well. Therefor...


Multipath Communication with Finite Sliding Window Network Coding for Ultra-Reliability and Low Latency

We use a random linear network coding (RLNC) based scheme for multipath communication in the presence of lossy links with different delay characteristics to obtain ultra-reliability and low latency. A sliding window version of RLNC is proposed where the coded packets are generated using the packets in a window and are inserted among systematic packets on different paths. The packets are schedule...



Publication date: 1999